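Recently, score-based diffusion models have shown satisfactory performance in MRI reconstruction. Most of these methods require a large amount of fully sampled MRI data as a training set, which is sometimes difficult to acquire in practice. This paper proposes a fully-sampled-data-free score-based diffusion model for MRI reconstruction, which learns the prior of fully sampled MR images from undersampled data in a self-supervised manner. Specifically, we first infer the fully sampled MR image distribution from undersampled data by Bayesian deep learning, then perturb the data distribution and approximate its probability density gradient by training a score function. Using the learned score function as a prior, we can reconstruct the MR image by performing conditioned Langevin Markov chain Monte Carlo (MCMC) sampling. Experiments on public datasets show that the proposed method outperforms existing self-supervised MRI reconstruction methods and achieves performance comparable to conventional score-based diffusion methods trained on fully sampled data.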
translated by Google Translate
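The conditioned Langevin sampling step mentioned in the abstract above can be illustrated with a one-dimensional toy problem. This is only a sketch under strong assumptions, not the paper's method: the learned score network is replaced by the analytic score of a standard normal prior, the undersampled Fourier measurement is replaced by a direct noisy observation `y = x + noise`, and the names `posterior_score` and `langevin_sample` are hypothetical.

```python
import math
import random


def posterior_score(x, y, noise_var):
    """Gradient of the log posterior for the toy model (assumed, not learned)."""
    prior_score = -x                        # score of the toy N(0, 1) prior
    likelihood_score = (y - x) / noise_var  # gradient of log N(y | x, noise_var)
    return prior_score + likelihood_score


def langevin_sample(y, noise_var, step=0.01, n_steps=20000, burn_in=1000, seed=0):
    """Unadjusted Langevin dynamics conditioned on the measurement y."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from a draw of the prior
    samples = []
    for i in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Langevin update: drift along the score plus injected Gaussian noise.
        x = x + step * posterior_score(x, y, noise_var) + math.sqrt(2.0 * step) * noise
        if i >= burn_in:
            samples.append(x)
    return samples


samples = langevin_sample(y=2.0, noise_var=0.25)
estimate = sum(samples) / len(samples)
exact = 2.0 / (1.0 + 0.25)  # analytic posterior mean for this Gaussian toy model
print(f"Langevin mean {estimate:.3f}, analytic posterior mean {exact:.3f}")
```

In this toy setting the posterior is Gaussian, so the sample mean can be checked against the closed-form value; in an actual reconstruction the score would come from a trained network and the likelihood term from the MRI forward model.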
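Recently, untrained neural networks (UNNs) have shown satisfactory performance for MR image reconstruction on random sampling trajectories without using additional fully sampled training data. However, existing UNN-based approaches do not fully use MR image physical priors, which leads to poor performance in some common scenarios (e.g., partial Fourier, regular sampling) and a lack of theoretical guarantees for reconstruction accuracy. To bridge this gap, we propose a safeguarded k-space interpolation method using a specially designed UNN driven by three physical priors of MR images (or k-space data): sparsity, coil sensitivity smoothness, and phase smoothness. We also prove that the proposed method guarantees tight bounds on the accuracy of the interpolated k-space data. Finally, ablation experiments show that the proposed method characterizes the physical priors of MR images more accurately than existing traditional methods. In addition, under a series of commonly used sampling trajectories, experiments show that the proposed method consistently outperforms traditional parallel imaging methods and existing UNNs, and in some cases even outperforms state-of-the-art supervised k-space deep learning methods.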
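Denoising diffusion probabilistic models (DDPMs) have shown excellent performance in MRI reconstruction. From the perspective of continuous stochastic differential equations (SDEs), the reverse process of DDPM can be regarded as maximizing the energy of the reconstructed MR image, which leads to divergence of the SDE sequence. A modified high-frequency DDPM model is therefore proposed for MRI reconstruction. From its continuous SDE viewpoint, termed the high-frequency space SDE (HFS-SDE), the energy-concentrated low-frequency part of the MR image is no longer amplified, and the diffusion process focuses more on acquiring high-frequency prior information. This not only improves the stability of the diffusion model but also makes better recovery of high-frequency details possible. Experiments on the publicly available FastMRI dataset show that the proposed HFS-SDE outperforms the DDPM-driven VP-SDE, supervised deep learning methods, and traditional parallel imaging methods in terms of stability and reconstruction accuracy.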
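Deep learning methods driven by low-rank regularization have achieved attractive performance in dynamic magnetic resonance (MR) imaging. However, most of these methods represent the low-rank prior with a handcrafted nuclear norm, which cannot accurately approximate the low-rank prior over the entire dataset through a fixed regularization parameter. In this paper, we propose a learned low-rank method for dynamic MR imaging. In particular, we unroll the half-quadratic splitting (HQS) algorithm for the partially separable (PS) model into a network, in which the low rank is adaptively characterized by a learnable null-space transform. Experiments on a cardiac cine dataset show that the proposed model outperforms state-of-the-art compressed sensing (CS) methods and existing deep learning methods both quantitatively and qualitatively.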
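Recently, model-driven deep learning has unrolled a certain iterative algorithm of a regularization model into a cascade network by replacing the first-order information (i.e., the (sub)gradient or proximal operator) of the regularizer with a network module, which appears more explainable and predictable than common data-driven networks. Conversely, in theory, there does not necessarily exist a functional regularizer whose first-order information matches the substituted network module, which means that the network output may not be covered by the original regularization model. Moreover, up to now, there has been no theory guaranteeing the global convergence and robustness (regularity) of unrolled networks under realistic assumptions. To bridge this gap, this paper proposes a safeguarded methodology for network unrolling. Specifically, focusing on accelerated MRI, we unroll a zeroth-order algorithm in which the network module represents the regularizer itself, so that the network output can still be covered by the regularization model. Furthermore, inspired by the idea of deep equilibrium models, we run the unrolled iterative network to convergence at a fixed point before backpropagation, which ensures convergence. When the measurement data contain noise, we prove that the proposed network is robust against noisy interference. Finally, numerical experiments show that the proposed network consistently outperforms state-of-the-art MRI reconstruction methods, including traditional regularization methods and other deep learning methods.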
A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
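The simplex equiangular tight frame referred to in the abstract above has a standard closed-form construction, sketched here for illustration. The sketch assumes the simplest embedding, in which the K vertices live in R^K and are built as sqrt(K/(K-1)) (I - (1/K) 1 1^T); the function names `simplex_etf` and `cosine` are hypothetical and the code is not taken from the paper's repository.

```python
import math


def simplex_etf(num_classes):
    """Rows are the K unit-norm vertices of a simplex ETF embedded in R^K."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


num_classes = 4
vertices = simplex_etf(num_classes)
pairwise = [
    cosine(vertices[i], vertices[j])
    for i in range(num_classes)
    for j in range(i + 1, num_classes)
]
# Every pair of distinct vertices has the same cosine, -1 / (K - 1).
print(pairwise)
```

Every pair of distinct vertices has cosine similarity -1/(K-1), which is the equiangular and maximally separated structure that, according to the abstract, is broken by class imbalance and contextual correlation in segmentation.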
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, spanning different input representations (e.g., point clouds, voxels, projected images), network architectures and training schemes. Through this study, we obtain two insights: 1) The input representation plays a crucial role in robustness. Specifically, under specific corruptions, different representations perform differently. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up by minimizing the number of inference steps but at the cost of sample quality. In this work, to improve the inference speed for DDPM-based TTS models while achieving high sample quality, we propose ResGrad, a lightweight diffusion model which learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compared with other acceleration methods for DDPM which need to synthesize speech from scratch, ResGrad reduces the complexity of the task by changing the generation target from the ground-truth mel-spectrogram to the residual, resulting in a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training this model. We verify ResGrad on the single-speaker dataset LJSpeech and two more challenging datasets with multiple speakers (LibriTTS) and high sampling rate (VCTK). Experimental results show that in comparison with other speed-up methods of DDPMs: 1) ResGrad achieves better sample quality with the same inference speed measured by real-time factor; 2) with similar speech quality, ResGrad synthesizes speech faster than baseline methods by more than 10 times. Audio samples are available at https://resgrad1.github.io/.
Crowdsourcing, in which human intelligence and productivity are dynamically mobilized to tackle tasks too complex for automation alone to handle, has grown to be an important research topic and inspired new businesses (e.g., Uber, Airbnb). Over the years, crowdsourcing has morphed from providing a platform where workers and tasks can be matched up manually into one which leverages data-driven algorithmic management approaches powered by artificial intelligence (AI) to achieve increasingly sophisticated optimization objectives. In this paper, we provide a survey presenting a unique systematic overview of how AI can empower crowdsourcing, which we refer to as AI-Empowered Crowdsourcing (AIEC). We propose a taxonomy which divides algorithmic crowdsourcing into three major areas: 1) task delegation, 2) motivating workers, and 3) quality control, focusing on the major objectives which need to be accomplished. We discuss the limitations and insights, and curate the challenges of doing research in each of these areas to highlight promising future research directions.
Fine-grained classification and counting of bone marrow erythroid cells are vital for evaluating the health status and formulating therapeutic schedules for leukemia or hematopathy. Due to the subtle visual differences between different types of erythroid cells, it is challenging to apply existing image-based deep learning models for fine-grained erythroid cell classification. Moreover, there are no large open-source datasets on erythroid cells to support model training. In this paper, we introduce BMEC (Bone Marrow Erythroid Cells), the first large fine-grained image dataset of erythroid cells, to facilitate more deep learning research on erythroid cells. BMEC contains 5,666 images of individual erythroid cells, each of which is extracted from bone marrow erythroid cell smears and professionally annotated as one of the four types of erythroid cells. To distinguish the erythroid cells, one key indicator is the cell shape, which is closely related to cell growth and maturation. Therefore, we design a novel shape-aware image classification network for fine-grained erythroid cell classification. The shape feature is extracted from the shape mask image and aggregated with the raw image feature by a shape attention module. With the shape-attended image feature, our network achieved superior classification performance (81.12% top-1 accuracy) on the BMEC dataset compared to the baseline methods. Ablation studies also demonstrate the effectiveness of incorporating the shape information for fine-grained cell classification. To further verify the generalizability of our method, we tested our network on two additional public white blood cell (WBC) datasets, and the results show that our shape-aware method can generally outperform recent state-of-the-art works on classifying WBCs. The code and BMEC dataset can be found at https://github.com/wangye8899/BMEC.